Guided part

1:

Read in the gapminder_clean.csv data as a tibble using read_csv.
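A minimal sketch of this step, assuming the `readr` package is installed. So that it runs on its own, it first writes a tiny stand-in file with invented values; only the column names are taken from the questions in this report. In the real analysis the path would simply be `"gapminder_clean.csv"`.

```r
library(readr)

# Toy stand-in for gapminder_clean.csv (invented values, for illustration only).
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Year,gdpPercap,continent,pop",
  "1962,1000.5,Europe,5000000",
  "1962,250.0,Asia,12000000"
), toy_path)

# read_csv() returns a tibble; show_col_types = FALSE silences the column report.
gapminder <- read_csv(toy_path, show_col_types = FALSE)
gapminder
```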

2:

Filter the data to include only rows where Year is 1962 and then make a scatter plot comparing ‘CO2 emissions (metric tons per capita)’ and gdpPercap for the filtered data.

As expected, the scatter plot shows a strong positive relationship between CO2 emissions and GDP per capita in 1962.
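The filter-then-plot step could look roughly like the sketch below, assuming `dplyr` and `ggplot2`. The rows are invented toy data; the column names follow the question. Names containing spaces and parentheses need backticks.

```r
library(dplyr)
library(ggplot2)

# Invented toy rows; the real data come from gapminder_clean.csv.
gapminder <- tibble(
  Year = c(1962, 1962, 1962, 1967),
  `CO2 emissions (metric tons per capita)` = c(0.5, 2.1, 8.3, 9.0),
  gdpPercap = c(600, 2500, 11000, 12000)
)

# Keep only the 1962 rows.
gap_1962 <- gapminder %>% filter(Year == 1962)

# Scatter plot of emissions against GDP per capita.
p <- ggplot(gap_1962,
            aes(x = gdpPercap, y = `CO2 emissions (metric tons per capita)`)) +
  geom_point() +
  labs(x = "GDP per capita", y = "CO2 emissions (metric tons per capita)")
p
```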

3:

On the filtered data, calculate the Pearson correlation of ‘CO2 emissions (metric tons per capita)’ and gdpPercap. What is the Pearson R value and associated p value?

The Spearman correlation coefficient is `0.9128636`.
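The question asks for Pearson while the reported coefficient is labelled Spearman. Base R's `cor.test()` computes either one together with its p value, so both can be reported side by side. A small sketch on invented vectors (not the gapminder values):

```r
# Invented toy vectors, chosen without ties so the exact Spearman p value is used.
co2 <- c(0.5, 2.1, 8.3, 1.2, 4.4, 6.0)
gdp <- c(600, 2500, 11000, 900, 6100, 5200)

pearson  <- cor.test(co2, gdp, method = "pearson")   # linear association
spearman <- cor.test(co2, gdp, method = "spearman")  # rank (monotonic) association

# Coefficient and p value for each method.
c(r = unname(pearson$estimate), p = pearson$p.value)
c(rho = unname(spearman$estimate), p = spearman$p.value)
```

Pearson measures linear association, whereas Spearman works on ranks and is less sensitive to outliers and skew, so the two can differ noticeably on data such as GDP per capita.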

4:

On the unfiltered data, answer “In what year is the correlation between ‘CO2 emissions (metric tons per capita)’ and gdpPercap the strongest?” Filter the dataset to that year for the next step…

The year with the strongest correlation between GDP per capita and CO2 emissions per capita is `2002`, with a Spearman correlation coefficient of `0.94`. We can observe the trend in the following figure:
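One way to find the year is to compute the coefficient within each year and sort by its absolute value. The sketch below assumes `dplyr` and uses invented toy data with two years; `method = "spearman"` mirrors the coefficient named in the answer and is an assumption about how it was obtained.

```r
library(dplyr)

# Invented toy data: two years, four observations each.
gapminder <- tibble(
  Year = rep(c(1962, 1967), each = 4),
  `CO2 emissions (metric tons per capita)` = rep(c(1, 2, 3, 4), times = 2),
  gdpPercap = c(10, 20, 40, 30,   # 1962: one rank swap
                10, 20, 30, 45)   # 1967: perfectly monotonic
)

by_year <- gapminder %>%
  group_by(Year) %>%
  summarise(
    rho = cor(`CO2 emissions (metric tons per capita)`, gdpPercap,
              method = "spearman", use = "complete.obs"),
    .groups = "drop"
  ) %>%
  arrange(desc(abs(rho)))  # strongest correlation first

strongest_year <- by_year$Year[1]
strongest_year
```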

5:

Using plotly, create an interactive scatter plot comparing ‘CO2 emissions (metric tons per capita)’ and gdpPercap, where the point size is determined by pop (population) and the color is determined by the continent. You can easily convert any ggplot plot to a plotly plot using the ggplotly() command.

We can see that African, American and Asian countries are at the bottom in both emissions and GDP per capita. European countries, with some exceptions, are richer and emit more carbon dioxide. The two American outliers must be Canada and the United States of America.
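A sketch of the conversion with `ggplotly()`, assuming the `ggplot2` and `plotly` packages. The data frame is invented toy data with the column names used in the question.

```r
library(ggplot2)
library(plotly)

# Invented toy rows, for illustration only.
toy <- data.frame(
  gdpPercap = c(800, 3000, 12000, 20000),
  co2       = c(0.3, 1.5, 7.0, 15.0),
  pop       = c(2e7, 5e7, 1e7, 3e8),
  continent = c("Africa", "Asia", "Europe", "Americas")
)
names(toy)[2] <- "CO2 emissions (metric tons per capita)"

# Static ggplot: point size from pop, colour from continent.
p <- ggplot(toy, aes(x = gdpPercap,
                     y = `CO2 emissions (metric tons per capita)`,
                     size = pop, color = continent)) +
  geom_point(alpha = 0.7)

# Convert to an interactive plotly widget.
interactive_plot <- ggplotly(p)
interactive_plot
```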

Unguided part

1:

What is the relationship between continent and ‘Energy use (kg of oil equivalent per capita)’? (stats test needed)

Before applying a statistical test, we visualize the distribution of energy use in each continent with boxplots. We use only the 2007 values so that the number of outliers stays readable. You can hover your mouse over a point in the interactive graph to see which country it represents.

Europe and Oceania are the most energy-consuming continents, followed by Asia and the Americas. Asian outliers are rich countries, due to oil or otherwise. The three clear American outliers are the two rich North American countries and the industry-intensive Trinidad & Tobago. Africa presents the lowest values, with South Africa and resource-rich Libya and Equatorial Guinea as outliers.

If the data are normally distributed and homoscedastic, an ANOVA test would be appropriate. Let’s check those assumptions.

Only Oceania seems to be normally distributed; for the other continents the null hypothesis (normality) is rejected. We shall thus use the Kruskal-Wallis test.
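The report does not say which checks were run. A common choice, assumed here, is a Shapiro-Wilk test per group for normality and Bartlett's test for equal variances, both available in base R. The values are invented toy data.

```r
# Invented toy data: energy use for three continents, five countries each.
toy <- data.frame(
  energy = c(300, 420, 510, 650, 800,
             2500, 2900, 3300, 3800, 4600,
             1200, 1500, 5200, 900, 2100),
  continent = rep(c("Africa", "Europe", "Asia"), each = 5)
)

# Normality: one Shapiro-Wilk p value per continent (small p suggests non-normality).
normality_p <- tapply(toy$energy, toy$continent,
                      function(x) shapiro.test(x)$p.value)
normality_p

# Homoscedasticity: Bartlett's test of equal variances across continents.
variance_test <- bartlett.test(energy ~ continent, data = toy)
variance_test$p.value
```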

The null hypothesis (energy use follows the same distribution on every continent) `is rejected`.
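The Kruskal-Wallis test compares groups through the ranks of the observations rather than their means. A sketch with base R's `kruskal.test()` on invented toy data:

```r
# Invented toy data: energy use for three continents, five countries each.
toy <- data.frame(
  energy = c(300, 420, 510, 650, 800,
             2500, 2900, 3300, 3800, 4600,
             1200, 1500, 5200, 900, 2100),
  continent = rep(c("Africa", "Europe", "Asia"), each = 5)
)

kw <- kruskal.test(energy ~ continent, data = toy)
kw$statistic  # Kruskal-Wallis chi-squared
kw$p.value    # reject the null hypothesis at the 5% level if below 0.05
```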

This test only rejects the hypothesis that all continents are alike; it does not tell us whether the difference between two specific continents is significant. To ascertain that, we used a Dunn test, which showed that all differences were significant except those between Asia and the Americas and between Europe and Oceania.
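The report does not name the package used for the Dunn test. The sketch below assumes the `dunn.test` package and a Bonferroni adjustment, which is only one of several possible choices. The values are invented toy data.

```r
library(dunn.test)  # assumption: this package, not necessarily the one used here

# Invented toy data: energy use for three continents, five countries each.
energy <- c(300, 420, 510, 650, 800,
            2500, 2900, 3300, 3800, 4600,
            1200, 1500, 5200, 900, 2100)
continent <- rep(c("Africa", "Europe", "Asia"), each = 5)

# Pairwise comparisons after Kruskal-Wallis, with adjusted p values.
dunn <- dunn.test(energy, continent, method = "bonferroni")

data.frame(comparison = dunn$comparisons, p_adjusted = dunn$P.adjusted)
```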

2:

Is there a significant difference between Europe and Asia with respect to ‘Imports of goods and services (% of GDP)’ in the years after 1990? (stats test needed)

As in the previous section, we will visualize before applying a statistical test:

The graph shows that mean imports as a % of GDP have grown faster in European countries than in Asian countries, with the two almost converging in 2007.

We will plot a histogram to check for obvious non-normality.

Data for Europe is clearly not normally distributed, so we will use the Wilcoxon non-parametric test.

The null hypothesis (no difference in imports as a % of GDP between Europe and Asia) is `not rejected`.
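A sketch of the comparison with base R's `wilcox.test()`, which for two independent groups is the Wilcoxon rank-sum (Mann-Whitney) test. The values are invented toy data, and the short column name `imports` is a stand-in for ‘Imports of goods and services (% of GDP)’.

```r
# Invented toy data; rows from 1987 are there only to show the year filter.
toy <- data.frame(
  Year      = c(rep(1987, 2), rep(c(1992, 1997, 2002, 2007, 2007), times = 2)),
  continent = c("Europe", "Asia", rep("Europe", 5), rep("Asia", 5)),
  imports   = c(20, 22,
                30, 45, 52, 61, 38,
                28, 41, 55, 64, 35)
)

# Keep Europe and Asia in the years after 1990.
recent <- subset(toy, Year > 1990 & continent %in% c("Europe", "Asia"))

# Two-sided rank-sum test; a large p value means the null is not rejected.
res <- wilcox.test(imports ~ continent, data = recent)
res$p.value
```

A large p value means only that the data give no evidence of a difference; it does not prove that the two groups are equal.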

3:

What is the country (or countries) that has the highest ‘Population density (people per sq. km of land area)’ across all years? (i.e., which country has the highest average ranking in this category across each time point in the dataset?)

`Macao SAR, China` was the country with the highest average population density ranking.
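The average ranking can be obtained by ranking countries within each year and then averaging those ranks per country. The sketch below assumes `dplyr`; the data are invented, and the column names `country` and `density` are hypothetical stand-ins for the real ones.

```r
library(dplyr)

# Invented toy data: three countries observed in two years.
toy <- tibble(
  country = rep(c("A", "B", "C"), times = 2),
  Year    = rep(c(1962, 2007), each = 3),
  density = c(100, 500, 50,    # 1962
              120, 400, 600)   # 2007
)

ranked <- toy %>%
  group_by(Year) %>%
  mutate(density_rank = min_rank(desc(density))) %>%  # rank 1 = densest that year
  group_by(country) %>%
  summarise(mean_rank = mean(density_rank), .groups = "drop") %>%
  arrange(mean_rank)                                  # best average rank first

ranked
```

In this toy case country B comes first: it is densest in 1962 and second in 2007, even though C has the single highest value.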

4:

What country (or countries) has shown the greatest increase in ‘Life expectancy at birth, total (years)’ since 1962?

The country featuring the greatest increase of life expectancy at birth in the 1962-2007 period was `Maldives`, with an increase of `36.9161463` years.
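One way to compute the increase is to subtract each country's 1962 value from its 2007 value and sort. The sketch assumes `dplyr`; the data are invented, and `country` and `life_exp` are hypothetical stand-ins for the real column names.

```r
library(dplyr)

# Invented toy data: life expectancy for three countries in 1962 and 2007.
toy <- tibble(
  country  = rep(c("A", "B", "C"), each = 2),
  Year     = rep(c(1962, 2007), times = 3),
  life_exp = c(50, 60,
               40, 70,
               65, 75)
)

increase <- toy %>%
  filter(Year %in% c(1962, 2007)) %>%
  group_by(country) %>%
  summarise(
    gain = life_exp[Year == 2007] - life_exp[Year == 1962],
    .groups = "drop"
  ) %>%
  arrange(desc(gain))  # largest increase first

increase
```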